DCP-NAS: Discrepant Child-Parent Neural Architecture Search for 1-Bit CNNs


TABLE 4.2
Effect of with/without the reconstruction error and the tangent direction constraint on the ImageNet data set. The architecture used for the experiments is DCP-NAS-L.

Tangent direction (Dα)   Reconstruction error (LR( ˆw, β))   Top-1   Top-5
--                       --                                  66.7    83.3
✓                        --                                  68.3    85.0
--                       ✓                                   68.2    85.1
✓                        ✓                                   72.4    89.2

used for both parent and child models. When applied to the Child model, w denotes the weights reconstructed from the binarized weights, that is, w = β ˆw.
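The reconstruction step can be sketched in a few lines. This is a minimal illustration, assuming a per-tensor scale β taken as the mean absolute value of the latent weights (a common choice that minimizes the squared reconstruction error for a scalar scale); the book's exact estimator for β may differ.

```python
import numpy as np

def binarize(w_latent):
    """Binarize latent weights to {-1, +1} with a per-tensor scale beta.

    beta = mean(|w_latent|) minimizes ||w_latent - beta * sign(w_latent)||^2
    over scalar beta (an assumption; the text does not spell out the estimator).
    """
    w_hat = np.sign(w_latent)          # 1-bit weights in {-1, +1}
    beta = np.abs(w_latent).mean()     # scalar scale factor
    return beta, w_hat

def reconstruct(beta, w_hat):
    # w = beta * w_hat: the reconstructed weights used in the Child model
    return beta * w_hat

# Example: binarize a small weight vector and reconstruct it
beta, w_hat = binarize(np.array([0.3, -0.7, 1.2, -0.1]))
w = reconstruct(beta, w_hat)
```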

4.4.7 Ablation Study

Effectiveness of Tangent Propagation In this section, we evaluate the effect of tangent propagation on the performance of DCP-NAS; the hyperparameters involved are λ and μ. We also discuss the effectiveness of the reconstruction error. The implementation details are given below.

To search for a better binary neural architecture, λ and μ are used to balance the KL divergence ˜f( ˆw, ˆα, β), which supervises the Child, the reconstruction error for binary weights LR( ˆw, β), and the constraint in the tangent direction Dα. We evaluated λ and μ on the ImageNet data set with the DCP-NAS-L architecture. To better understand tangent propagation on the large-scale ImageNet ILSVRC12 data set, we conducted experiments to examine how the tangent direction constraint affects performance. Based on the experiments described above, we first set λ to 5e3 and μ to 0.2 when they are used. As shown in Table 4.2, both

FIGURE 4.15
With different λ and μ, we evaluated the Top-1 accuracies of DCP-NAS-L on ImageNet.
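As a rough sketch of how λ and μ might combine the three terms described above, the Child objective can be written as a weighted sum. All function bodies below are illustrative assumptions, not the book's code: the KL term is shown for discrete output distributions, the reconstruction error as a squared error between latent and scaled 1-bit weights, and the tangent-direction constraint as a squared distance between Parent and Child architecture gradients (one plausible reading of Dα).

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) between discrete distributions, e.g. Parent vs. Child outputs
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def reconstruction_error(w_latent, beta):
    # LR(w, beta): squared error between latent weights and their
    # scaled 1-bit reconstruction (an assumed form)
    return float(np.sum((w_latent - beta * np.sign(w_latent)) ** 2))

def tangent_constraint(grad_parent, grad_child):
    # Dα: squared distance between Parent and Child architecture
    # gradients (illustrative placeholder for the tangent-direction term)
    return float(np.sum((grad_parent - grad_child) ** 2))

def child_loss(p, q, w_latent, beta, g_parent, g_child, lam=5e3, mu=0.2):
    # Weighted sum of the three terms; lam=5e3 and mu=0.2 follow the
    # settings reported in the text
    return (kl_divergence(p, q)
            + lam * reconstruction_error(w_latent, beta)
            + mu * tangent_constraint(g_parent, g_child))
```

With matched Parent/Child outputs, perfectly reconstructable weights, and identical gradients, all three terms vanish and the loss is zero, which is a quick sanity check on the implementation.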